Abstract:
In the current context of Big Data, a multitude of new NoSQL solutions for storing, managing, and extracting information and patterns from semi-structured data have been proposed and implemented. These solutions were developed to alleviate the rigidity of the data structures found in relational databases by introducing semi-structured and flexible schema design. As the data currently generated by different sources and devices, especially IoT sensors and actuators, use either the XML or the JSON format, depending on the application, database technologies that store and query semi-structured data in XML format are needed. Thus, Native XML Databases, which were initially designed to manipulate XML data using standardized querying languages, i.e., XQuery and XPath, were rebranded as NoSQL Document-Oriented Database Systems. Currently, the majority of these solutions have been replaced with the more modern JSON-based Database Management Systems. However, we believe that XML-based solutions can still deliver performance in executing complex queries on heterogeneous collections. Unfortunately, current research lacks a clear comparison of the scalability and performance of database technologies that store and query documents in XML versus the more modern JSON format. Moreover, to the best of our knowledge, there are no Big Data-compliant benchmarks for such database technologies. In this paper, we present a comparison of selected Document-Oriented Database Systems that either use the XML format to encode documents, i.e., BaseX, eXist-db, and Sedna, or the JSON format, i.e., MongoDB, CouchDB, and Couchbase. To underline the performance differences, we also propose a benchmark that uses a heterogeneous complex schema on a large DBLP corpus. (C) 2021 The Authors. Published by Elsevier Inc.
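The XML-versus-JSON contrast can be made concrete with a small sketch (not taken from the paper's benchmark) that navigates the same DBLP-style record in both encodings; all element and field names below are illustrative assumptions.

```python
# Query the same DBLP-style record in its XML and JSON encodings.
import json
import xml.etree.ElementTree as ET

xml_doc = """
<dblp>
  <article key="journals/example/Sample21">
    <author>Ada Lovelace</author>
    <author>Alan Turing</author>
    <title>On Heterogeneous Collections</title>
    <year>2021</year>
  </article>
</dblp>
"""

json_doc = json.dumps({
    "dblp": [{
        "key": "journals/example/Sample21",
        "authors": ["Ada Lovelace", "Alan Turing"],
        "title": "On Heterogeneous Collections",
        "year": 2021,
    }]
})

# XPath-style navigation over the XML encoding (ElementTree supports only
# a subset of XPath; full XQuery would run inside BaseX, eXist-db, or Sedna).
root = ET.fromstring(xml_doc)
xml_authors = [a.text for a in root.findall(".//article/author")]

# Equivalent navigation over the JSON encoding (in MongoDB this would be
# expressed as a query filter over the document instead).
doc = json.loads(json_doc)
json_authors = doc["dblp"][0]["authors"]

print(xml_authors)   # ['Ada Lovelace', 'Alan Turing']
print(json_authors)  # ['Ada Lovelace', 'Alan Turing']
```

Both paths recover the same answer; the benchmark question is how the two families of engines scale when such navigations run over large, heterogeneous collections.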
Abstract:
The tasks of configuring and tuning large database management systems (DBMSs) have always been both complex and time-consuming. They require knowledge of the characteristics of the system, the data, and the workload, and of the interrelationships between them. The increasing diversity of the data and the workloads handled by today's systems is making manual tuning by database administrators almost impossible. Self-tuning DBMSs, which dynamically reallocate resources in response to changes in their workload in order to maintain predefined levels of performance, are one approach to handling the tuning problem. In this paper, we apply self-tuning technology to managing the buffer pools, which are a key resource in a DBMS. Tuning the size of the buffer pools to a workload is crucial to achieving good performance. We describe a Buffer Pool Tuning Wizard that can be used by database administrators to determine effective buffer pool sizes. The wizard is based on a self-tuning algorithm called the Dynamic Reconfiguration algorithm (DRF), which uses the principle of goal-oriented resource management. It is an iterative algorithm that uses greedy heuristics to find a reallocation that benefits a target transaction class. We define and motivate the cost estimate equations used in the algorithm. We present the results of a set of experiments investigating the performance of the algorithm.
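The greedy, goal-oriented idea behind such an iterative reallocation can be sketched as follows. The hit-rate curves below are a hypothetical stand-in with diminishing returns, not the paper's cost estimate equations; pool names and parameters are invented.

```python
# Greedily move page increments toward the target class's buffer pool
# while total memory stays fixed, stopping when no move is beneficial.
def hit_rate(pages, saturation):
    """Toy diminishing-returns hit rate for a pool of a given size."""
    return pages / (pages + saturation)

def reallocate(pools, saturations, target, step=10, rounds=50):
    pools = dict(pools)
    for _ in range(rounds):
        donors = [p for p in pools if p != target and pools[p] > step]
        if not donors:
            break
        # Donor whose hit rate drops the least when giving up `step` pages.
        donor = min(donors,
                    key=lambda p: hit_rate(pools[p], saturations[p])
                                  - hit_rate(pools[p] - step, saturations[p]))
        gain = (hit_rate(pools[target] + step, saturations[target])
                - hit_rate(pools[target], saturations[target]))
        loss = (hit_rate(pools[donor], saturations[donor])
                - hit_rate(pools[donor] - step, saturations[donor]))
        if gain <= loss:      # no reallocation benefits the target class
            break
        pools[donor] -= step
        pools[target] += step
    return pools

result = reallocate({"A": 100, "B": 100, "C": 100},
                    {"A": 50, "B": 500, "C": 500}, target="A")
print(result)
```

The target pool grows only while its estimated benefit exceeds the cheapest donor's estimated penalty, which is the essence of a goal-oriented greedy heuristic.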
Abstract:
The authors address the problem of providing a homogeneous framework for integrating, in a database environment, active rules, which allow the specification of actions to be executed whenever certain events take place, and deductive rules, which allow the specification of deductions in a logic programming style. Actually, it is widely recognized that both kinds of rules enhance the capabilities of database systems since they provide very natural mechanisms for the management of various important activities (e.g., knowledge representation, complex data manipulation, integrity constraint enforcement, view maintenance). However, in spite of their strong relationship, little work has been done on the unification of these powerful paradigms. They present a rule-based language with an event-driven semantics that allows programmers to express both active and deductive computations. The language is based on a new notion of production rules whose effect is both a change of state and an answer to a query. By using several examples, they show that this simple language schema allows one to uniformly define different computations on data, including complex data manipulations, deductive evaluations, and active rule processing. They define the semantics of the language and then describe the architecture of a preliminary implementation of the language. Finally, they report on the application and experience of using the language.
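The unification of active and deductive rules can be illustrated with a toy event-driven engine (names and semantics invented here, not the authors' language): the same insertion event both changes the database state and contributes a derived answer.

```python
# Minimal event-driven rule engine: rules fire on events; a firing may
# change state (active flavour) or derive an answer tuple (deductive flavour).
class RuleEngine:
    def __init__(self):
        self.facts = set()
        self.rules = []      # (event_name, action) pairs
        self.answers = []

    def on(self, event, action):
        self.rules.append((event, action))

    def raise_event(self, event, payload):
        for ev, action in self.rules:
            if ev == event:
                action(self, payload)

engine = RuleEngine()

# Active flavour: an insertion event triggers a state change.
engine.on("insert", lambda e, emp: e.facts.add(("employee", emp)))

# Deductive flavour: the same event also derives an answer tuple;
# the "_mgr" suffix is a trivial stand-in for a real deduction.
engine.on("insert", lambda e, emp: e.answers.append(("manager", emp))
          if emp.endswith("_mgr") else None)

engine.raise_event("insert", "alice_mgr")
engine.raise_event("insert", "bob")
print(engine.facts)    # state changed by the active rule
print(engine.answers)  # answers derived by the deductive rule
```

A single rule format driving both a change of state and an answer to a query is exactly the dual effect the production-rule notion above describes.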
Abstract:
As the size of the databases containing personal data is expanding very fast worldwide, the mass collection and processing of personal data has raised a lot of concerns about the manner in which the personal data of an individual are processed. In an effort to address privacy concerns, the European Parliament adopted the Data Protection Directive, which requires organisations to take steps to ensure their compliance. Current database technology fails to allow organisations to comply with the requirements of the new data protection legislation. In this paper, a complete set of the DBMS operability requirements is presented, in order to support the EU Directive. These requirements affect the database facilities for identifying individuals and for audit trails, the security and processing mechanisms of the DBMSs, and the kind of data that needs to be stored. An implementation model is also proposed.
Abstract:
Several elements of the process safety management (PSM) regulation require tracking and documentation of actions: process hazard analyses, management of change, process safety information, operating procedures, training, contractor safety programs, pre-startup safety reviews, incident investigations, emergency planning, and compliance audits. These elements can generate hundreds of action items annually that require follow-up. This tracking and documentation is commonly identified as a failing in compliance audits, and is difficult to manage through action lists, spreadsheets, or other tools that are comfortably manipulated by plant personnel. This paper discusses the recent implementation of a database management system at a chemical plant and chronicles the improvements accomplished through the introduction of a customized system. The system as implemented modeled the normal plant workflows, and provided simple, recognizable user interfaces for ease of use.
Abstract:
The problems in building a transaction processing system are discussed, and it is shown that the difficulties are a function of specific attributes of the underlying database system. A model of a transaction processing system is presented, and five system dimensions important in classifying transaction processing systems, the process, machine, heterogeneity, data, and site components, are introduced. The specific problems posed by various combinations of system characteristics are analyzed. The evolution of transaction processing systems is described in terms of this framework.
Abstract:
Purpose - Today's database management systems implement sophisticated access control mechanisms to prevent unauthorized access and modifications. For instance, this is an important basic requirement for SOX (Sarbanes-Oxley Act) compliance, whereby every past transaction has to be traceable at any time. However, malicious database administrators may still be able to bypass the security mechanisms in order to make hidden modifications to the database. This paper aims to address these issues. Design/methodology/approach - In this paper the authors define a novel signature of a B+-tree, a widely-used storage structure in database management systems, and propose its utilization for supporting the logging in databases. This additional logging mechanism is especially useful in conjunction with forensic techniques that directly target the underlying tree-structure of an index. Several techniques for applying this signature in the context of digital forensics on B+-trees are proposed in the course of this paper. Furthermore, the authors' signature can be used to generate exact copies of an index for backup purposes, thereby enabling the owner to completely restore data, even on the structural level. Findings - For database systems in enterprise environments, compliance with regulatory standards such as SOX (Sarbanes-Oxley Act), whereby every past transaction has to be traceable at any time, is a fundamental requirement. Today's database management systems usually implement sophisticated access control mechanisms to prevent unauthorized access and modifications. Nonetheless, malicious database administrators would be able to bypass the security mechanisms in order to make modifications to the database, while covering their tracks. Originality/value - In this paper, the authors demonstrate how the tree structure of the underlying storage engine can be used to enhance forensic logging mechanisms of the database.
They define a novel signature for B+-trees, which are used by the InnoDB storage engine. This signature stores the structure of database storage files and can help in reconstructing previous versions of the file for forensic purposes. Furthermore, the authors' signature can be used to generate exact copies of an index for backup purposes, thus enabling the owner to completely restore data, even on the structural level. The authors applied their concept to four real-life scenarios in order to evaluate its effectiveness.
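A toy sketch of the structural-signature idea: hash a B+-tree's shape (keys and child layout) bottom-up, so any structural change, even one invisible at the logical level, yields a different signature. The node layout and hashing scheme below are illustrative, not InnoDB's on-disk format.

```python
# Compute a recursive digest over a B+-tree's structure.
import hashlib

def signature(node):
    """node = (keys, children); leaves have an empty children list."""
    keys, children = node
    h = hashlib.sha256()
    for k in keys:
        h.update(str(k).encode() + b"|")      # separator avoids key-run collisions
    for child in children:
        h.update(signature(child).encode())   # recurse into subtrees
    return h.hexdigest()

# Two trees holding the same logical keys 1..4 but with different shapes.
leaf1, leaf2 = ([1, 2], []), ([3, 4], [])
split_tree = ([3], [leaf1, leaf2])            # two leaves under a separator
flat_tree = ([], [([1, 2, 3, 4], [])])        # one leaf holding all keys

sig_before = signature(split_tree)
sig_after = signature(flat_tree)
print(sig_before != sig_after)  # True: same content, different structure
```

Because the digest covers structure rather than only logical content, a rebuilt index that silently changed shape, e.g., after a covert modification, no longer matches the recorded signature.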
Abstract:
In many application areas, the access pattern is navigational and a large fraction of the accesses are perfect match accesses/queries on one or more words in text strings in the objects. One example of such an application area is XML data stored in object database systems. Such systems will frequently store large amounts of data, and in order to provide the necessary computing power and data bandwidth, a parallel system based on a shared-nothing architecture can be necessary. In this paper, we describe how the signature cache approach can significantly reduce the average object access cost in parallel object database systems.
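The signature idea can be sketched in miniature: each object's words are hashed into a small cached bitmask, and a query word's mask is tested against the cache first, so most non-matching objects are skipped without a remote fetch. The signature width and hash choice here are illustrative assumptions.

```python
# Word signatures as superimposed bit codes with a candidate filter.
import zlib

SIG_BITS = 64

def word_bit(word):
    """Deterministically map a word to one bit of the signature."""
    return 1 << (zlib.crc32(word.encode()) % SIG_BITS)

def object_signature(text):
    sig = 0
    for w in text.split():
        sig |= word_bit(w)
    return sig

objects = {
    "o1": "parallel object database",
    "o2": "xml data storage",
}
# The signature cache holds only one small integer per object.
cache = {oid: object_signature(txt) for oid, txt in objects.items()}

def candidates(query_word):
    """Objects whose signature may contain the word: no false negatives,
    but occasional false positives that must be re-checked on fetch."""
    qb = word_bit(query_word)
    return [oid for oid, sig in cache.items() if sig & qb == qb]

print(candidates("xml"))  # always includes 'o2'; 'o1' only on a hash collision
```

In a shared-nothing setting the cached signatures live locally, so the expensive cross-node object fetch is paid only for candidates that survive the bit test.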
Abstract:
The database system is the infrastructure of the modern information system. R & D in database systems and their technologies is one of the important research topics in the field. Database R & D in China took off later but has moved along by giant steps. This report presents the achievements Renmin University of China (RUC) has made in the past 25 years and, at the same time, addresses some of the research projects we at RUC are currently working on. The National Natural Science Foundation of China supports and initiates most of our research projects, and these successfully conducted projects have produced fruitful results.
Abstract:
The variety of data is one of the most challenging issues for research and practice in data management systems. Data are naturally organized in different formats and models, including structured data, semi-structured data, and unstructured data. In this survey, we introduce the area of multi-model DBMSs, which build a single database platform to manage multi-model data. Even though multi-model databases are a newly emerging area, in recent years we have witnessed many database systems embrace this category. We provide a general classification and multi-dimensional comparisons of the most popular multi-model databases. This comprehensive introduction to existing approaches and open problems, from the technique and application perspectives, makes this survey useful for motivating new multi-model database approaches, as well as a technical reference for developing multi-model database applications.
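The single-platform idea can be sketched with one engine serving two models at once (here SQLite as a stand-in; schema and data are invented): a relational table whose `profile` column stores a JSON document, so structured and semi-structured queries share one store.

```python
# One store, two access paths: relational columns plus a JSON document column.
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, profile TEXT)")
con.execute("INSERT INTO users VALUES (1, 'ada', ?)",
            (json.dumps({"langs": ["sql", "xquery"], "city": "London"}),))
con.execute("INSERT INTO users VALUES (2, 'alan', ?)",
            (json.dumps({"langs": ["python"], "city": "Manchester"}),))

# Relational access path: ordinary SQL over the structured columns.
names = [r[0] for r in con.execute("SELECT name FROM users ORDER BY id")]

# Document access path: filter on a field inside the JSON document
# (a multi-model engine would push this predicate into the query language).
xquery_users = [name
                for name, profile in con.execute("SELECT name, profile FROM users")
                if "xquery" in json.loads(profile)["langs"]]

print(names)         # ['ada', 'alan']
print(xquery_users)  # ['ada']
```

A genuine multi-model DBMS goes further by making the document (or graph, or key-value) predicate a first-class part of one query language over one engine, which is the design space the survey classifies.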